theoretical computer science
Lean Meets Theoretical Computer Science: Scalable Synthesis of Theorem Proving Challenges in Formal-Informal Pairs
Zhang, Terry Jingchen, Jiang, Wenyuan, Liu, Rongchuan, Wang, Yisong, Yang, Junran, Wang, Ning, Ni, Nicole, Huang, Yinya, Sachan, Mrinmaya
Formal theorem proving (FTP) has emerged as a critical foundation for evaluating the reasoning capabilities of large language models, enabling automated verification of mathematical proofs at scale. However, progress has been constrained by limited datasets due to the high cost of manual curation and the scarcity of challenging problems with verified formal-informal correspondences. We propose leveraging theoretical computer science (TCS) as a scalable source of rigorous proof problems, where algorithmic definitions enable automated generation of arbitrarily many challenging theorem-proof pairs. We demonstrate this approach on two TCS domains: Busy Beaver problems, which involve proving bounds on Turing machine halting behavior, and Mixed Boolean Arithmetic problems, which combine logical and arithmetic reasoning. Our framework automatically synthesizes problems with parallel formal (Lean4) and informal (Markdown) specifications, creating a scalable pipeline for generating verified proof challenges. Evaluation on frontier models reveals substantial gaps in automated theorem proving: while DeepSeekProver-V2-671B achieves 57.5\% success on Busy Beaver problems, it manages only 12\% on Mixed Boolean Arithmetic problems. These results highlight the difficulty of long-form proof generation even for problems that are computationally easy to verify, demonstrating the value of TCS domains for advancing automated reasoning research.
FPT-Approximability of Stable Matching Problems
Chen, Jiehua, Roy, Sanjukta, Simola, Sofia
We study parameterized approximability of three optimization problems related to stable matching: (1) Min-BP-SMI: Given a stable marriage instance and a number k, find a size-at-least-k matching that minimizes the number $β$ of blocking pairs; (2) Min-BP-SRI: Given a stable roommates instance, find a matching that minimizes the number $β$ of blocking pairs; (3) Max-SMTI: Given a stable marriage instance with preferences containing ties, find a maximum-size stable matching. The first two problems are known to be NP-hard to approximate to any constant factor and W[1]-hard with respect to $β$, making the existence of an EPTAS or FPT-algorithms unlikely. We show that they are W[1]-hard with respect to $β$ to approximate to any function of $β$. This means that unless FPT=W[1], there is no FPT-approximation scheme for the parameter $β$. The last problem (Max-SMTI) is known to be NP-hard to approximate to factor-29/33 and W[1]-hard with respect to the number of ties. We complement this and present an FPT-approximation scheme for the parameter "number of agents with ties".
Interview with Nisarg Shah: Understanding fairness in AI and machine learning
During the 33rd International Joint Conference on Artificial Intelligence (IJCAI), held in Jeju, I had the opportunity to meet with one of the keynote speakers, and winner of the 2024 IJCAI Computers and Thought Award, Professor Nisarg Shah. I asked him about his research, the role of theory in machine learning research, fairness and safety guarantees, regulation, conference reviews, and advice for those just starting out on their research journey. Could you start by telling us about yourself, your career, and your education? Nisarg Shah (NS): I grew up in India and went to IIT Bombay for my undergraduate. Ever since then, I knew that I wanted to go into higher education and academia. I actually did do an industrial placement after my undergrad, and I got a job offer that was very lucrative and would have been more lucrative than doing a PhD. However, that [money] is not why I wanted to do my PhD. I wanted to do my PhD because I was genuinely curious about different questions in this field, and I wanted to study more about them and have fun while doing it.
Biocomputation: Moving Beyond Turing with Living Cellular Computers
It is a well-known story that theoretical computer science and biology have been drawing inspiration from each other for decades. While computer science has tried to mimic the functioning of living systems to develop computing models, including automata, artificial neural networks, and evolutionary algorithms, biology has used computing as a metaphor to explain the functioning of living systems.4 For example, biologists have used Boolean logic to conceptualize gene regulation since early 1970s, when Jacques Monod wrote the inspirational statement "… like the workings of computers."40 This article contends that information processing is the link between computer science and molecular biology. In computer science, a model of computation such as finite state machines or Turing machines defines how to generate output from a set of inputs and a set of rules or instructions.
Fast Parameterized Preprocessing for Polynomial-Time Solvable Graph Problems
A holy grail of theoretical computer science, with numerous fundamental implications to more applied areas of computing such as operations research and artificial intelligence, is the question of whether NP is equal to P. Much of modern technology is based on the so-called Cobham-Edmonds' thesis (named after Alan Cobham and Jack Edmonds), which states that algorithmic problems can be feasibly computed on a computational device only if they can be computed in polynomial time. In a nutshell: P means "feasible" while NP-hard means "infeasible" to compute. Moving from theory to practice, and looking at problems more realistically, the size of the exponent in a polynomial-time algorithm matters a lot. In the era of big data, when n is, say, the number of users of Facebook or Twitter, an Θ(n3)– or even Θ(n2)-time algorithm might be too slow. In such applications where the input size n is very large, any practically efficient algorithm that solves a problem to optimality can only afford a linear or almost linear-time complexity.
Universal Imitation Games
Alan Turing proposed in 1950 a framework called an imitation game to decide if a machine could think. Using mathematics developed largely after Turing -- category theory -- we analyze a broader class of universal imitation games (UIGs), which includes static, dynamic, and evolutionary games. In static games, the participants are in a steady state. In dynamic UIGs, "learner" participants are trying to imitate "teacher" participants over the long run. In evolutionary UIGs, the participants are competing against each other in an evolutionary game, and participants can go extinct and be replaced by others with higher fitness. We use the framework of category theory -- in particular, two influential results by Yoneda -- to characterize each type of imitation game. Universal properties in categories are defined by initial and final objects. We characterize dynamic UIGs where participants are learning by inductive inference as initial algebras over well-founded sets, and contrast them with participants learning by conductive inference over the final coalgebra of non-well-founded sets. We briefly discuss the extension of our categorical framework for UIGs to imitation games on quantum computers.
Larger Offspring Populations Help the $(1 + (\lambda, \lambda))$ Genetic Algorithm to Overcome the Noise
Ivanova, Alexandra, Antipov, Denis, Doerr, Benjamin
Evolutionary algorithms are known to be robust to noise in the evaluation of the fitness. In particular, larger offspring population sizes often lead to strong robustness. We analyze to what extent the $(1+(\lambda,\lambda))$ genetic algorithm is robust to noise. This algorithm also works with larger offspring population sizes, but an intermediate selection step and a non-standard use of crossover as repair mechanism could render this algorithm less robust than, e.g., the simple $(1+\lambda)$ evolutionary algorithm. Our experimental analysis on several classic benchmark problems shows that this difficulty does not arise. Surprisingly, in many situations this algorithm is even more robust to noise than the $(1+\lambda)$~EA.
Shtetl-Optimized » Blog Archive » My AI Safety Lecture for UT Effective Altruism
Two weeks ago, I gave a lecture setting out my current thoughts on AI safety, halfway through my year at OpenAI. I was asked to speak by UT Austin's Effective Altruist club. You can watch the lecture on YouTube here (I recommend 2x speed). The timing turned out to be weird, coming immediately after the worst disaster to hit the Effective Altruist movement in its history, as I acknowledged in the talk. I then spent 20 minutes taking questions. For those who (like me) prefer text over video, below I've produced an edited transcript, by starting with YouTube's automated transcript and then, well, editing it. Thank you so much for inviting me here. I do feel a little bit sheepish to be lecturing you about AI safety, as someone who's worked on this subject for all of five months. But this past spring, I accepted an extremely interesting opportunity to go on leave for a year to think about what theoretical computer science can do for AI safety. I'm doing this at OpenAI, which is one of the world's leading AI startups, based in San Francisco although I'm mostly working from Austin. Despite its name, OpenAI is famously not 100% open … so there are certain topics that I'm not allowed to talk about, like the capabilities of the very latest systems and whether or not they'll blow people's minds when released. By contrast, OpenAI is very happy for me to talk about AI safety: what it is and and what if anything can we do about it. So what I thought I'd do is to tell you a little bit about the specific projects that I've been working on at OpenAI, but also just, as an admitted newcomer, share some general thoughts about AI safety and how Effective Altruists might want to think about it. I'll try to leave plenty of time for discussion. Maybe I should mention that the thoughts that I'll tell you today are ones that, until last week, I had considered writing up for an essay contest run by something called the FTX Future Fund. Unfortunately, the FTX Future Fund no longer exists. It was founded by someone named Sam Bankman-Fried, whose a net worth went from 15 billion dollars to some negative number of dollars in the space of two days, in one of the biggest financial scandals in memory. This is obviously a calamity for the EA community, which had been counting on funding from this individual. I feel terrible about all the projects left in the lurch, to say nothing of FTX's customers. Let's start with this: raise your hand if you've tried GPT-3.
Welcome Back!
Computational sciences in the India region are going through an exciting time. While India has always had significant strength in theoretical computer science (CS), in recent years it has developed substantial presence and maturity in other, more applied areas of CS such as hardware and computer architecture, data science and artificial intelligence (AI), and cyber-security. Alongside pure research, there has been a significant push toward lab-to-field projects and technology transfer and deployment, creating broad impact to the region and beyond. Significant efforts have been made on the democratization of education through online courses, enabling the vast population to learn from a relatively limited number of available experts. All these activities have continued to bolster India's already strong IT industry and been a factor in the huge increase in the number of startups (under 1,000 in 2016 to over 60,000 in 2022a), with the number of unicorn startups reaching 100.b
Neuromorphic chips more energy efficient for deep learning
Neuromorphic chips have been endorsed in research showing that they are much more energy efficient at operating large deep learning networks than non-neuromorphic hardware. This may become important as AI adoption increases. The study was carried out by the Institute of Theoretical Computer Science at the Graz University of Technology (TU Graz) in Austria using Intel's Loihi 2 silicon, a second-generation experimental neuromorphic chip announced by Intel Labs last year that has about a million artificial neurons. Their research paper, "A Long Short-Term Memory for AI Applications in Spike-based Neuromorphic Hardware," published in Nature Machine Intelligence, claims that the Intel chips are up to 16 times more energy efficient in deep learning tasks than performing the same task on non-neuromorphic hardware. The hardware tested consisted of 32 Loihi chips.